An overview of {openair}
1 Introduction
An important aspect of R (and Python) is that, compared with Excel, new users will find it ‘fussy’ about how it deals with data, and this takes some getting used to. Ultimately though, this is a good thing. Once data are correctly formatted, openair can quickly yield many types of useful analysis, as this document aims to demonstrate.
The most common stumbling block to using openair is getting the data correctly formatted!
Start by loading the openair package:
library(openair)
2 Import one year of data for the London Marylebone roadside site
Importing data in this way also provides hourly estimates of wind speed and wind direction from the CMAQ regional air quality model. These data should be available from part way through 2010 to the present day.
As an aside, it can be very effective to organise access to air quality data in this way. In the UK, data from hundreds of sites can easily be accessed.
It is easy to do this in openair with one short line of code:
london_road <- importAURN(site = "my1", year = 2018)
This is a good example of how R code is run in openair functions. In this case, two key pieces of information are required: the site code of interest and the year. That’s it.
3 Quick summary of the data
summaryPlot(london_road)
[Figure: output of openair::summaryPlot()]
4 Produce a time series of NO2 concentrations
timePlot(london_road,
         pollutant = "no2",
         ylab = "no2 (ug/m3)")
[Figure: output of openair::timePlot()]
openair will try to format common pollutant names and units properly.
timePlot(london_road,
         pollutant = "no2",
         avg.time = "day")
[Figure: openair::timePlot() time series, averaged by day]
5 Plot data in calendar format
Choose some nice colours as well.
calendarPlot(london_road,
             pollutant = "no2",
             cols = "viridis")
[Figure: output of openair::calendarPlot()]
Which days had concentrations > 100 µg m⁻³? This is a more complicated example with additional options included.
calendarPlot(london_road, pollutant = "no2",
             breaks = c(0, 100, 500),
             labels = c("0 to 100", "> 100"),
             cols = c("turquoise4", "deeppink"))
[Figure: output of openair::calendarPlot(), this time binned into different air quality domains]
7 Plot a wind rose
windRose(london_road)
[Figure: output of openair::windRose()]
8 Plot a polar plot
polarPlot(london_road,
          pollutant = "no2",
          cols = "plasma")
[Figure: output of openair::polarPlot()]
9 Proportion plot
timeProp(london_road,
         pollutant = "no2",
         proportion = "wd",
         avg.time = "week")
[Figure: output of openair::timeProp()]
10 The {openair} type option
Being able to look at the dependencies of pollutant concentrations on other factors is immensely useful. It can be very illuminating to see how a pollutant varies by season, hour of the day, day of the week, cloud cover, other pollutants and so on. Being able to consider these dependencies quickly and efficiently greatly helps analysis and leads to a more question-led, interactive approach.
However, we don’t want to spend ages processing data! Here’s a quick example:
pollutionRose(london_road,
              type = "season",
              pollutant = "no2")
[Figure: pollution rose split using the type option]
And a brief summary of in-built types:
- “year” splits data by year
- “month” splits variables by month of the year
- “monthyear” splits data by year and month
- “season” splits variables by season. Note in this case the user can also supply a hemisphere option that can be either “northern” (default) or “southern”, so in Australia / New Zealand you will want hemisphere = "southern".
- “weekday” splits variables by day of the week
- “weekend” splits variables by Saturday, Sunday, weekday
- “daylight” splits variables by night-time/daytime. Note the user must supply a longitude and latitude
- “dst” splits variables by daylight saving time and non-daylight saving time (see manual for more details)
- “wd” if wind direction (wd) is available, type = "wd" will split the data up into 8 sectors: N, NE, E, SE, S, SW, W, NW.
- “seasonyear” (or “yearseason”) will split the data into year-season intervals, keeping the months of a season together. For example, December 2010 is considered as part of winter 2011 (with January and February 2011). This makes it easier to consider contiguous seasons. In contrast, type = "season" will just split the data into four seasons regardless of the year.
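The “seasonyear” rule can be illustrated with a few lines of base R. This is only a sketch of the idea: the helper season_year() is hypothetical, it assumes northern-hemisphere seasons with winter defined as December to February, and it is not how openair implements the split internally.

```r
# Hypothetical helper for illustration only; not openair's internal code.
# Assumes northern-hemisphere seasons (winter = December, January, February).
season_year <- function(dates) {
  month <- as.integer(format(dates, "%m"))
  year  <- as.integer(format(dates, "%Y"))

  # Look up the season for each month number (1 = January, ..., 12 = December)
  season <- c("winter", "winter", "spring", "spring", "spring",
              "summer", "summer", "summer", "autumn", "autumn",
              "autumn", "winter")[month]

  # December is counted with the following year's winter,
  # so December 2010 joins January and February 2011.
  year <- ifelse(month == 12L, year + 1L, year)

  paste(season, year)
}

dates <- as.Date(c("2010-11-15", "2010-12-15", "2011-01-15", "2011-03-15"))
season_year(dates)
```

Here the December 2010 and January 2011 dates receive the same label, which is what keeps a winter contiguous across the year boundary.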
If a categorical variable is present in a data frame, e.g. site, then that variable can be used directly, e.g. type = "site".
type can also be a numeric variable. In this case the numeric variable is split into 4 quantiles, i.e. four partitions containing approximately equal numbers of points. Note the user can supply the option n.levels to indicate how many quantiles to use.
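To make the idea of a quantile split concrete, here is a minimal base R sketch. The helper split_quantiles(), its n_levels argument and the nox values are made up for illustration; openair's own splitting may differ in detail, for example in how ties and interval labels are handled.

```r
# Illustrative base R only; not openair's internal code.
split_quantiles <- function(x, n_levels = 4) {
  # Break points at evenly spaced probabilities, e.g. 0, 0.25, 0.5, 0.75, 1
  breaks <- quantile(x, probs = seq(0, 1, length.out = n_levels + 1),
                     na.rm = TRUE)
  # unique() guards against repeated break points when x has many ties
  cut(x, breaks = unique(breaks), include.lowest = TRUE)
}

# Made-up concentrations, purely for demonstration
nox <- c(12, 35, 48, 60, 75, 90, 120, 150, 210, 300, 410, 520)
groups <- split_quantiles(nox)
table(groups)  # number of points that fall in each interval
```

In openair the split happens internally when a numeric variable is passed to type: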
pollutionRose(london_road,
              pollutant = "o3",
              type = "nox",
              grid.line = 10)
What’s missing to make this more useful? The availability of surface meteorological data massively increases the types of analysis that can be carried out. We can also easily access surface measurements which will probably be more accurate than modelled data. This is something we will come back to.
Also, what about site metadata such as site classification, pollutants measured etc? That is also something that can be easily accessed in openair.
11 Trends
11.1 Theil-Sen trend estimates
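For orientation, the Theil-Sen estimate of a trend is the median of the slopes between all pairs of points, which makes it resistant to outliers. The sketch below illustrates the idea in base R with a made-up series. The helper theil_sen_slope() is hypothetical and is not how openair::TheilSen() is implemented; the openair function also draws bootstrap samples, which this sketch does not attempt.

```r
# Illustrative base R only; not openair's implementation.
theil_sen_slope <- function(x, y) {
  pairs <- combn(length(x), 2)         # all index pairs (i, j) with i < j
  dx <- x[pairs[2, ]] - x[pairs[1, ]]
  dy <- y[pairs[2, ]] - y[pairs[1, ]]
  keep <- dx != 0                      # ignore pairs with identical x
  median(dy[keep] / dx[keep])
}

# Made-up series: a steady increase of 2 per step, plus one outlier at the end
year <- 1:8
conc <- c(10, 12, 14, 16, 18, 20, 22, 80)

theil_sen_slope(year, conc)        # median of pairwise slopes
coef(lm(conc ~ year))[["year"]]    # least-squares slope, pulled up by the outlier
```

The pairwise-median slope stays at the underlying rate of change, whereas the least-squares slope is inflated by the single outlying value.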
Trends will be considered in more depth in a later session. In this case we will use the longer time series that comes with openair called mydata.
TheilSen(mydata, pollutant = "o3")
[1] "Taking bootstrap samples. Please wait."
[Figure: output of openair::TheilSen()]
But we can easily do more. For example, it can be very useful to look at trends split by season, with the data also averaged by season:
TheilSen(mydata,
         pollutant = "o3",
         avg.time = "season",
         type = "season")
[1] "Taking bootstrap samples. Please wait."
[1] "Taking bootstrap samples. Please wait."
[1] "Taking bootstrap samples. Please wait."
[1] "Taking bootstrap samples. Please wait."
[Figure: output of openair::TheilSen()]
11.2 Non-parametric smooth trends
Often we do not want to fit a straight line through the data but instead want to reveal the nature of the variation over time. The smoothTrend() function is useful in this situation.
smoothTrend(mydata, pollutant = "no2")
[Figure: output of openair::smoothTrend()]
Trends by wind sector:
smoothTrend(mydata, pollutant = "no2",
            type = "wd",
            date.breaks = 4)
[Figure: output of openair::smoothTrend()]